The effect of genetic markers associated with IgA nephropathy (IgAN) on disease risk, sub-phenotypes and progression is uncertain. Data from 2096 Chinese subjects were used to construct both unweighted (uw) and weighted (w) genetic risk scores (GRS), and the associations of the GRS with disease susceptibility and clinical parameters were assessed. All nine selected single nucleotide polymorphisms (SNPs) were associated with susceptibility to IgAN. uwGRS and wGRS showed a similar fit in disease associations. With every 1-unit increase in the uwGRS, disease risk increased by approximately 20%, whereas with every one standard deviation increase in the wGRS, disease risk increased by approximately 40% to 60%. The association between rs3803800 and serum IgA was replicated, and the high-risk GRS groups were associated with increased IgA/IgA1 levels. uwGRS9 ≥ 16 was an independent predictor of end stage renal disease (ESRD) in IgAN, with a relative risk of 2.52 (p = 6.68 × 10^-3). In conclusion, we observed that GRSs comprising nine SNPs identified in a GWAS of IgAN were strongly associated with susceptibility to IgAN. The high-risk GRS9 group had a high risk of ESRD during follow-up.



IgA nephropathy (IgAN) is the most prevalent primary chronic glomerular disease worldwide. Its clinical manifestations and progression vary. The predicted 20-year survival without the need for dialysis was 96% among patients with no risk factors, versus 36% among those with all three risk factors: urinary protein excretion of more than 1 g per day, hypertension (> 140/90 mm Hg), and severe histological lesions at the time of renal biopsy. Thus, risk prediction is vital for disease prevention, and refining prediction strategies remains important for targeting treatment recommendations. One potential area of improvement has been the discovery of genetic markers for IgAN, as well as for intermediate phenotypes such as proteinuria and blood pressure.

Genetic factors undoubtedly influence the pathogenesis of IgAN, with an estimated heritability of 40%-50%. Recent genome-wide association studies (GWASs) have identified genetic markers associated with IgAN. In a study using a standardized seven-SNP genetic risk score (GRS), disease risk increased sharply with eastward and northward distance from Africa, which correlated with differences in disease prevalence among world populations. In addition, the score explained 4.7% of overall IgAN risk, and each one standard deviation increase in the score was associated with a nearly 50% increase in the odds of disease. This strongly suggests that a multi-locus genetic risk score may be useful for predicting disease susceptibility. Because genetic backgrounds are stable, their effects may act over the entire life course. However, it remains unknown whether the cumulative effects of variants identified by GWASs could benefit prediction of disease progression and treatment decisions.

No best GRS model was recommended in the recent GRIPS Statement (recommendations for the reporting of Genetic Risk Prediction Studies), and it has been widely observed that, for most diseases, the count method (risk allele counts, i.e., the total number of risk alleles an individual carries, or unweighted GRS) shows discriminative accuracy similar to that of the log odds procedure (sums of the natural logarithm of the allelic odds ratio for each risk allele within and across loci, or weighted GRS), with less complexity because no weighting is required. Therefore, based on data from GWASs, we constructed both weighted and unweighted genetic scores. Our first aim was to construct models that were easy to interpret yet valid for risk prediction. A comparison with the previously established seven-SNP GRS (a weighted score) was also conducted. The scores were then tested for their ability to predict both disease/intermediate phenotype susceptibility and disease progression in a Northern Chinese Han population. As the strongest association was observed with a subset of alleles encoding class II human leukocyte antigens (HLA), whereas several non-HLA loci also showed genetic associations, both HLA allele scores and non-HLA allele scores were constructed to evaluate their respective roles in specific sub-phenotypes.

Results 

Association between single SNP selected and susceptibility to IgAN. 

As shown in Table 1, all the SNPs selected for further GRS analysis were associated with susceptibility to IgAN. The seven most strongly associated alleles were the SNPs reported in the previous GWAS conducted in our cohort, and were also the seven SNPs used in the previous seven-SNP GRS in the geospatial risk analysis across populations. The associations of the two novel SNPs selected from the Southern Chinese Han GWAS with IgAN were also replicated. Although they showed less significant p values for disease association in the current study, their effect sizes were similar to those in the previous report from a Southern Chinese population: the odds ratios (ORs) for rs2738048C and rs3803800G were 0.81 and 0.87 in our cohort, compared with 0.79 and 0.83, respectively, in the previous report. Thus, the data imply that the associations between the nine SNPs and IgAN are genuine and that our cohort is a representative population for further risk stratification.

Linkage disequilibrium (LD) analysis indicated that the HLA variants rs9275224, rs2856717 and rs9275596 were in partial LD, with r² ranging from 0.33 to 0.75; however, they were not in LD with two other HLA variants, rs9357155 and rs1883414 (r² < 0.1). When the nine SNPs were included together in a logistic model, they all showed significant associations with susceptibility to IgAN. Concordant with previous reports, conditional analysis indicated that all nine SNPs were independently associated. No gene-gene interactions were observed among the nine SNPs, including the interaction between the CFHR3/CFHR1 (rs6677604) and HORMAD2 (rs2412971) loci (p = 0.41) reported for the previous seven-SNP genetic risk score.



Individual association between single SNPs and clinical parameters of IgAN. The individual associations between the nine susceptibility SNPs and clinical phenotypes of IgAN were assessed in our cohort. We observed that the risk allele A of rs3803800 was associated with an increased serum IgA level (P = 3.91 × 10^?), concordant with previous reports from a Southern Chinese Han GWAS. Serum IgA concentrations (g/L, mean ± standard deviation) were 3.15 ± 1.21, 3.18 ± 1.19 and 3.55 ± 1.33 for rs3803800 GG, AG and AA, respectively (Figure 1). We also observed associations of rs2412971 with serum IgA and IgA1 levels, and of rs1883414 with gross hematuria and hypertension (Table 2). Risk genotypes appeared to be associated with a higher serum IgA or IgA1 level, a higher frequency of gross hematuria or a higher frequency
Figure 1 | Associations between rs3803800 genotypes and serum IgA level.



SCIENTIFIC REPORTS | 4 : 4904 | DOI: 10.1038/srep04904






[Figure 2 panels: uwGRS5, uwGRS7 and uwGRS9 score distributions; legend: Controls, IgAN.]

Figure 2 | Distribution of unweighted genetic risk score (uwGRS) between IgAN and controls. (A) uwGRS5, (B) uwGRS7, (C) uwGRS9, (D) uwGRS4. The p values indicate comparison of cases and controls using a chi-squared test.



of hypertension (Table 2). However, the effect size conferred by the 
risk genotype was only moderate, and none of the associations 
survived the multiple-testing correction. 

Observed GRS in IgAN and controls. We constructed four genetic scores from different combinations of IgAN risk alleles: GRS5 (five reported HLA alleles), GRS7 (five reported HLA alleles and two non-HLA alleles, the same SNPs as in the previously reported standardized GRS), GRS9 (five reported HLA alleles and four non-HLA alleles) and GRS4 (four non-HLA alleles). Each score was computed in both weighted and unweighted form. For comparison, we also directly adopted the standardized GRS as previously reported.
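The four panels can be written down explicitly. The sketch below is illustrative only: the grouping of rs IDs into HLA and non-HLA sets follows the SNPs named in this paper, genotypes are assumed to be already coded as risk-allele dosages (0, 1 or 2), and the names `PANELS` and `uw_grs` are hypothetical.

```python
# Hypothetical encoding of the four SNP panels described in the text.
HLA = ["rs9275224", "rs2856717", "rs9275596", "rs9357155", "rs1883414"]
NON_HLA_7 = ["rs6677604", "rs2412971"]      # non-HLA SNPs of the seven-SNP score
NON_HLA_EXTRA = ["rs2738048", "rs3803800"]  # added from the Southern Chinese GWAS

PANELS = {
    "GRS5": HLA,
    "GRS7": HLA + NON_HLA_7,
    "GRS9": HLA + NON_HLA_7 + NON_HLA_EXTRA,
    "GRS4": NON_HLA_7 + NON_HLA_EXTRA,
}


def uw_grs(dosages, panel):
    """Unweighted GRS: total number of risk alleles carried across the panel.

    `dosages` maps rs ID -> risk-allele count (0, 1 or 2). Individuals with any
    missing genotype in the panel are rejected (complete-case rule).
    """
    missing = [snp for snp in panel if snp not in dosages]
    if missing:
        raise ValueError(f"missing genotypes: {missing}")
    for snp in panel:
        if dosages[snp] not in (0, 1, 2):
            raise ValueError(f"invalid dosage for {snp}: {dosages[snp]}")
    return sum(dosages[snp] for snp in panel)
```

With nine biallelic SNPs, uwGRS9 ranges from 0 to 18 risk alleles, and each panel is simply a different subset summed the same way.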

The distributions of unweighted GRSs (uwGRSs) differed significantly between IgAN cases and controls (Figure 2); higher uwGRS values (more risk alleles) were more frequent in IgAN than in controls. With every 1-unit increase in the uwGRS, i.e., one additional copy of a risk allele, disease risk increased by about 20% to 30% (Table 3). Using the difference value (difference value = uwGRS_IgAN − uwGRS_control) as a risk function, the difference values of uwGRS5, uwGRS7 and uwGRS9 were much further from zero than that of uwGRS4 (the non-HLA risk score). This suggests that IgAN cases carried about one more copy of a risk allele than controls, mainly at HLA loci.
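The per-unit risk increase comes from a logistic regression of case status on the score. As a minimal sketch (not the SPSS procedure used by the authors), a one-predictor logistic model can be fitted by Newton-Raphson in pure Python; the function names are hypothetical.

```python
import math


def fit_logistic_1d(x, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of logit P(y = 1) = b0 + b1 * x by Newton-Raphson.

    x: numeric scores (e.g., uwGRS), y: 1 for IgAN case, 0 for control.
    Returns (b0, b1); exp(b1) is the odds ratio per 1-unit increase in x.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = i00 = i01 = i11 = 0.0  # score vector and information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        if det <= 0:
            raise ValueError("information matrix is singular (constant predictor?)")
        # Newton step: beta <- beta + I^{-1} * score
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


def odds_ratio_per_unit(scores, is_case):
    _, b1 = fit_logistic_1d(scores, is_case)
    return math.exp(b1)
```

Under this log-additive model an OR of r per risk allele compounds to r^k for k extra alleles; for example, an OR of 1.2 per allele corresponds to 1.2² = 1.44 for two additional alleles.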

The data from the weighted GRS (wGRS) models (the risk score equations are shown in Table 4) were concordant with those from the unweighted models. With every one standard deviation increase in the score, disease risk increased by about 40% to 60%. The ORs for a one standard deviation increase were 1.47 for wGRS5 (95% CI: 1.34-1.61, P = 8.83 × 10^?), 1.60 for wGRS7 (95% CI: 1.45-1.76, P = 7.36 × 10^?), 1.63 for wGRS9 (95% CI: 1.48-1.80, P = 5.66 × 10^?), 1.42 for wGRS4 (95% CI: 1.30-1.56, P = 9.58 × 10^?) and 1.68 for the standardized GRS (95% CI: 1.53-1.84, P = 9.42 × 10^?). Examination of wGRS quartiles also showed a pattern of increasing disease risk with each successive quartile. Using quartile 1 (lowest level of risk) as the reference group, quartile 4 had the highest odds of IgAN, with ORs of 2.37, 3.17, 3.34, 2.28 and 3.67 for wGRS5, wGRS7, wGRS9, wGRS4 and the standardized GRS, respectively. The trends across all quartiles were highly significant regardless of which wGRS was used (Table 5).
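The quartile analysis can be sketched as follows, assuming (as the Methods state) that cut points come from the control distribution and that each quartile is compared with Q1 by an odds ratio with a Woolf (log-scale) 95% CI. The CI method and the per-SD conversion exp(beta × SD) are assumptions, and the function names are hypothetical. The test reuses the Table 5 quartile counts; the Q1 case count there is the remainder, 1190 minus the other three quartiles.

```python
import math
import statistics


def control_quartile_cutpoints(control_scores):
    # Three cut points (25th, 50th, 75th percentiles) taken from controls only.
    return statistics.quantiles(control_scores, n=4)


def assign_quartile(score, cuts):
    # 1-based quartile index; a score equal to a cut point goes to the lower quartile.
    for q, c in enumerate(cuts, start=1):
        if score <= c:
            return q
    return 4


def odds_ratio_vs_reference(cases_k, controls_k, cases_ref, controls_ref, z=1.96):
    """OR of quartile k vs the reference quartile with a Woolf 95% CI."""
    or_ = (cases_k / controls_k) / (cases_ref / controls_ref)
    se = math.sqrt(1 / cases_k + 1 / controls_k + 1 / cases_ref + 1 / controls_ref)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


def or_per_sd(beta_per_unit, sd):
    # OR for a one-SD increase, given a per-unit log-OR from a logistic fit.
    return math.exp(beta_per_unit * sd)
```

For example, `odds_ratio_vs_reference(527, 242, 133, 224)` gives an OR of about 3.67 with a CI of about 2.82-4.77.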

Observed GRS and clinical parameters of IgAN. We assessed the associations between the cumulative genetic effects of the GWAS-identified SNPs and clinical parameters of IgAN at the time of renal biopsy, including proteinuria, hematuria, eGFR, hypertension, hyperlipidemia, hyperuricemia, CKD stage and Haas grade (Table 2). No clear associations were observed, except a marginally significant association between GRS4 and gross hematuria (p < 0.05). Consistent with the individual associations between single SNPs and clinical parameters of IgAN, significant associations of GRS with IgA and IgA1 levels were
Table 5 | Risk of susceptibility to IgAN based on quartiles of wGRS

Quartile | Cases: score | Cases: n (%) | Controls: score | Controls: n (%) | OR | 95% CI | P
Q1 | | 133 (11.2) | -0.85 | 224 (24.9) | 1 (reference) | |
Q2 | 0.15 | 228 (19.2) | 0.11 | 216 (24.0) | 1.78 | 1.34-2.36 | 6.74 × 10^-5
Q3 | 0.78 | 302 (25.4) | 0.75 | 217 (24.1) | 2.34 | 1.78-3.09 | 1.14 × 10^-9
Q4 | 1.57 | 527 (44.3) | 1.48 | 242 (26.9) | 3.67 | 2.82-4.77 | 3.57 × 10^-22
P for trend = 3.54 × 10^-24.

We calculated the odds for the top group (Q4) compared with the bottom group (Q1) as the reference group.













observed: serum IgA levels increased with increasing uwGRS or wGRS. The associations were more prominent for GRSs that included non-HLA alleles (GRS4, GRS7 and GRS9 rather than GRS5), suggesting that the effect was mainly driven by non-HLA alleles. However, the associations became non-significant after correction for multiple testing.

Association between genetic information and prognosis of IgAN. 

rs3803800 and GRS4 were marginally associated with prognostic indicators, including natural log-transformed time-averaged mean arterial pressure and eGFR slope, in linear regression (Table 6). In univariate Cox regression analysis, GRS5, GRS7 and GRS9 were associated with progression to end stage renal disease (ESRD), with uwGRS showing slightly more sensitive associations than wGRS. Although the relative risks appeared similar, uwGRS9 showed the most significant association with progression to ESRD.
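The prognostic indicators above are summaries of repeated measurements. The text does not spell out the averaging scheme, so the sketch below assumes a trapezoidal time average (area under the curve divided by follow-up length), the common approximation MAP = DBP + (SBP - DBP)/3, and a per-patient least-squares eGFR slope; all function names are hypothetical.

```python
def time_average(times, values):
    """Time-averaged value: trapezoidal area under the curve / follow-up length.

    Assumes at least two visits with strictly increasing times.
    """
    if len(times) < 2 or len(times) != len(values):
        raise ValueError("need >= 2 paired observations")
    area = 0.0
    for (t0, v0), (t1, v1) in zip(zip(times, values), zip(times[1:], values[1:])):
        if t1 <= t0:
            raise ValueError("times must be strictly increasing")
        area += (v0 + v1) / 2 * (t1 - t0)
    return area / (times[-1] - times[0])


def mean_arterial_pressure(sbp, dbp):
    # Common clinical approximation: DBP + one third of the pulse pressure.
    return dbp + (sbp - dbp) / 3


def ols_slope(times, values):
    """Least-squares slope of values on time (e.g., eGFR change per year)."""
    n = len(times)
    mt = sum(times) / n
    mv = sum(values) / n
    sxx = sum((t - mt) ** 2 for t in times)
    sxy = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    return sxy / sxx
```

The natural log of a time-averaged value can then serve as the outcome of the linear regressions reported in Table 6.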

Consistent with previous reports, the GRSs significantly discriminated between IgAN and controls (AUC about 0.6, p < 0.001; Table 7), with GRS9 and the standardized GRS showing the best fit in model prediction. Using the Kaplan-Meier survival method with the optimal cut-off value identified from a receiver operating characteristic (ROC) curve (16 for uwGRS9, with a sensitivity of 0.96 and a specificity of 0.93), we observed a worse renal prognosis in IgAN patients with uwGRS9 ≥ 16: the 10-year ESRD rate was 26.3%, compared with 12.1% in patients with uwGRS9 < 16 (Figure 3, p = 7.91 × 10^?). When ACEI/ARB use and steroid use (yes or no) were introduced as covariates into a multivariate Cox regression analysis, uwGRS9 ≥ 16 remained an independent predictor of ESRD in IgAN. The relative risks for uwGRS9 ≥ 16, ACEI/ARB use



Table 6 | Correlation of the SNPs and GRS with prognosis of IgAN in follow-up

Genetic information | Log (TA-proteinuria) | Log (TA-MAP) | eGFR slope | ESRD
rs6677604 | 0.14 | 0.49 | 0.20 | 0.80
rs9275224 | 0.93 | 0.78 | 0.40 | 0.11
rs2856717 | 0.62 | 0.66 | 0.23 | 0.16
rs9275596 | 0.86 | 0.38 | 0.13 | 0.10
rs9357155 | 0.31 | 3.31 × 10^? (Beta 0.13) | 0.79 | 0.92
rs1883414 | 0.29 | 0.21 | 0.17 | 0.14
rs2738048 | 0.50 | 9.71 × 10^? (Beta -0.19) | 0.63 | 0.20
rs3803800 | 0.37 | 1.56 × 10^? (Beta -0.15) | 4.25 × 10^? (Beta 0.12) | 0.76
rs2412971 | 0.71 | 0.79 | 0.18 | 0.29
uwGRS5 | 0.64 | 0.94 | 0.11 | 4.81 × 10^? (Beta 0.18)
uwGRS7 | 0.70 | 0.92 | 0.37 | 3.04 × 10^? (Beta 0.19)
uwGRS9 | 0.90 | 0.11 | 0.98 | 1.67 × 10^? (Beta 0.17)
uwGRS4 | 0.32 | 3.84 × 10^? (Beta -0.16) | 1.35 × 10^? (Beta 0.14) | 0.15
wGRS5 | 0.72 | 0.89 | 0.12 | 0.05
wGRS7 | 0.85 | 0.95 | 0.35 | 3.96 × 10^? (Beta 0.44)
wGRS9 | 0.98 | 0.47 | 0.55 | 2.51 × 10^? (Beta 0.46)
wGRS4 | 0.32 | 0.10 | 1.80 × 10^? (Beta 0.13) | 0.20
Standardized GRS | 0.65 | 0.89 | 0.89 | 2.25 × 10^? (Beta 0.48)

Values are p values. TA, time-averaged; MAP, mean arterial pressure.
Linear regression was applied for the correlation analysis of natural log-transformed time-averaged proteinuria, natural log-transformed time-averaged mean arterial pressure and eGFR slope.
Univariate Cox regression analysis was applied for the association with progression to ESRD.
Effect estimates (OR/Beta) are shown only for significant associations (p < 0.05).
Table 7 | Comparison of different genetic risk scores in disease prediction

Genetic risk score | Nagelkerke's R² | C-statistic (95% CI) | P
uwGRS5 | 4.5% | 0.61 (0.58-0.63) | 1.77 × 10^?
uwGRS7 | 6.4% | 0.62 (0.60-0.65) | 2.97 × 10^-22
uwGRS9 | 7.3% | 0.63 (0.61-0.66) | 6.83 × 10^?
uwGRS4 | 3.0% | 0.59 (0.56-0.61) | 3.28 × 10^?
wGRS5 | 4.2% | 0.61 (0.58-0.63) | 1.41 × 10^?
wGRS7 | 6.1% | 0.62 (0.60-0.65) | 4.35 × 10^?
wGRS9 | 6.8% | 0.63 (0.61-0.66) | 6.83 × 10^?
wGRS4 | 3.7% | 0.59 (0.56-0.61) | 3.28 × 10^?
Standardized GRS | 7.7% | 0.64 (0.61-0.66) | 1.72 × 10^?

As reported, the percentage of the total variance in the disease state explained by the risk score was estimated by Nagelkerke's pseudo R² from the logistic regression model, with the risk score as a quantitative predictor and disease state as the outcome. The C-statistic was estimated as the area under the receiver operating characteristic curve provided by the above logistic model.









and steroid use were 2.52 (95% CI, 1.29-4.91, p = 6.68 × 10^-3), 0.09 (95% CI, 0.04-0.23, p = 2.85 × 10^?) and 3.75 (95% CI, 1.90-7.41, p = 1.42 × 10^-4), respectively. Clinical parameters at the time of disease onset, including blood pressure, hematuria, proteinuria and renal pathology, did not differ significantly between the uwGRS9 ≥ 16 group and the uwGRS9 < 16 group (p > 0.05). Similarly, using the standardized GRS mean + SD as the cut-off value, a marginally significant difference in the 10-year ESRD rate was observed between IgAN patients with standardized GRS ≥ mean + SD and those with standardized GRS < mean + SD (21.3% vs. 12.1%, p = 0.06).
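The 10-year ESRD rates quoted above are read off Kaplan-Meier curves. A minimal pure-Python sketch of the product-limit estimator follows; it assumes follow-up times in months and an event indicator of 1 for ESRD/death and 0 for censoring (both encodings are assumptions), and it is not the SPSS procedure the authors used.

```python
def kaplan_meier(times, events):
    """Product-limit estimate; returns [(t, S(t))] at each distinct event time.

    events: 1 = ESRD/death observed, 0 = censored. At tied times, events are
    counted before censorings are removed from the risk set.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= deaths + censored
    return curve


def survival_at(curve, t):
    # Step function: the last estimate at or before t (1.0 before the first event).
    s = 1.0
    for ti, si in curve:
        if ti <= t:
            s = si
        else:
            break
    return s
```

With times in months, `1 - survival_at(curve, 120)` gives the estimated cumulative ESRD probability at 10 years for one uwGRS9 group.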

Discussion 

We tested previously established IgAN-associated SNPs in a large collection of Chinese subjects. Although the two SNPs from a Southern Chinese GWAS were only marginally associated with IgAN in our cohort, all nine SNPs were replicated as associated with IgAN, suggesting genuine genetic effects.

To determine their cumulative effect, we constructed two types of genetic risk score for IgAN, uwGRS and wGRS, from different portfolios of SNPs (HLA alleles, non-HLA alleles, or both), and tested their association with ESRD events and their potential for disease prediction. Compared with single SNPs, GRSs were more significantly associated with susceptibility to IgAN. All GRSs were associated with IgAN, and the prediction power increased with the number of SNPs included. IgAN cases carried about one more copy of a risk allele than controls, mainly at HLA loci. Although a non-HLA uwGRS model (uwGRS4) could



[Figure 3 plot: survival functions for uwGRS9 ≥ 16 and uwGRS9 < 16; x-axis, follow-up time; log-rank test P = 7.91 × 10^?. Numbers at risk: uwGRS9 ≥ 16, 57, 53, 52, 51, 46, 44, 42 (73.7%); uwGRS9 < 16, 240, 233, 223, 216, 214, 213, 211, 211, 211 (87.9%).]

Figure 3 | Kaplan-Meier curves of survival without ESRD/dialysis/death events in IgAN patients stratified by uwGRS9 (≥ 16 vs. < 16), with time zero set at kidney biopsy. Using the Kaplan-Meier survival method with the optimal derived cut-off value, the 10-year ESRD rate was 26.3% in IgAN patients with uwGRS9 ≥ 16, compared with 12.1% in those with uwGRS9 < 16.
also discriminate IgAN from controls, the difference in uwGRS between IgAN and controls was smaller than that of the HLA GRS model (uwGRS5). Compared with the HLA GRS model (GRS5), adding non-HLA alleles (GRS7 and GRS9) increased the prediction power only slightly. This suggests that HLA allele-based GRSs may have greater power in disease prediction than non-HLA allele-based GRSs. Another issue is the GRS calculation method. Similar to previous reports, we did not observe a meaningful difference in disease prediction power between the two methods (uwGRS and wGRS), as shown by only slightly different AUC (difference < 0.05) and Nagelkerke's pseudo R² (difference < 0.01) values for the same number of risk alleles. This is consistent with previous reports: for most complex diseases, in which the ORs of risk alleles are similar and close to 1, it matters little for discriminative accuracy whether genetic scores are constructed using the count method or the log odds procedure. uwGRS showed slightly lower p values than wGRS, suggesting that uwGRS might be chosen for risk stratification in IgAN; however, this may not hold for other diseases. The ORs for IgAN risk alleles were similar and close to 1 (< 1.5); therefore, weighting seemed to have only a marginal effect. If risk alleles with widely divergent ORs or larger effects are identified, the strategy may need to change. However, the seven-SNP genetic risk score showed a better fit in disease prediction than GRS7 and, occasionally, GRS9. There are several possible explanations: it was based on more samples (five times as many as ours); it included information on a genetic interaction; and the model was constructed using a stepwise logistic regression algorithm. Future evaluation of the seven-SNP genetic risk score for sub-phenotypes and disease prognosis in more diverse populations is still warranted.
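Nagelkerke's pseudo R², used above to compare scores, rescales the Cox-Snell R² by its maximum attainable value. A minimal sketch, assuming the fitted case probabilities from a logistic model are already available (the function name is hypothetical):

```python
import math


def nagelkerke_r2(y, p_hat):
    """Nagelkerke's pseudo R^2 from observed 0/1 outcomes and fitted probabilities."""
    n = len(y)
    n1 = sum(y)
    if n1 in (0, n):
        raise ValueError("need both cases and controls")
    p_bar = n1 / n
    # Log-likelihood of the intercept-only (null) model.
    ll0 = n1 * math.log(p_bar) + (n - n1) * math.log(1 - p_bar)
    # Log-likelihood of the fitted model.
    ll1 = sum(math.log(p) if yi else math.log(1 - p) for yi, p in zip(y, p_hat))
    cox_snell = 1 - math.exp(2 * (ll0 - ll1) / n)
    max_cox_snell = 1 - math.exp(2 * ll0 / n)
    return cox_snell / max_cox_snell
```

Because the denominator depends only on the case/control split, two scores fitted to the same subjects can be compared directly, which is how small differences (< 0.01) between uwGRS and wGRS become visible.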

For clinical parameter or sub-phenotype associations, we observed nominally significant associations between GRS and serum IgA/IgA1 levels. The high-risk genetic group was consistently associated with increased IgA1 levels. The data also validated the association between rs3803800 and serum IgA level noted in a previous GWAS of IgA level conducted in a Chinese population. Our data support the notion that genetically deregulated IgA plays a key role in the pathogenesis of IgAN. However, associations of single or cumulative genetic effects with other sub-phenotypes of IgAN were weaker or less consistent.

When the weak effects of the individual SNPs were considered together, we observed a strong and consistent effect of the GRS on ESRD. The effect was independent of therapy with ACEI/ARB and corticosteroids. Consistent with a recent GRS study in hypertension, which suggested that a blood pressure genetic risk score could be a significant predictor of incident cardiovascular events, the current data may further support the prospects for genetic risk prediction in clinical practice. We speculate that the genetic variants have cumulative effects on IgA deregulation involved in disease susceptibility and progression. Although power analysis indicated that we had a power of about 0.6-0.8 to detect a two-fold increased risk for clinical parameters and disease progression, assuming an α-level of 0.05 and an allele frequency of 10%-30% (http://biostat.mc.vanderbilt.edu/PowerSampleSize), the effect sizes identified were far smaller than two and none of the associations survived correction for multiple testing. Thus, the data require further replication and functional investigation.
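The power figures quoted above come from the PS program. As a rough, illustrative alternative only (not the authors' calculation), the normal approximation for a two-sided allelic two-proportion z-test can be sketched as below; the allelic design, the group sizes in any example call and the function names are hypothetical.

```python
from statistics import NormalDist


def case_allele_freq(p0, odds_ratio):
    """Risk-allele frequency in cases implied by control frequency p0 and an allelic OR."""
    odds1 = odds_ratio * p0 / (1 - p0)
    return odds1 / (1 + odds1)


def allelic_power(n_cases, n_controls, p0, odds_ratio, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test on allele counts.

    Only the tail in the direction of the true effect is counted.
    """
    z = NormalDist()
    p1 = case_allele_freq(p0, odds_ratio)
    m1, m0 = 2 * n_cases, 2 * n_controls  # alleles, not individuals
    p_bar = (m1 * p1 + m0 * p0) / (m1 + m0)
    se0 = (p_bar * (1 - p_bar) * (1 / m1 + 1 / m0)) ** 0.5  # SE under H0
    se1 = (p1 * (1 - p1) / m1 + p0 * (1 - p0) / m0) ** 0.5  # SE under H1
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf((abs(p1 - p0) - z_crit * se0) / se1)
```

Comparing `allelic_power(n1, n0, 0.2, 2.0)` with `allelic_power(n1, n0, 0.2, 1.5)` for hypothetical group sizes shows how quickly power falls once the true OR is well below two.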

The strengths of the current study include the large sample size, the availability of complete genetic information and relevant covariates, the comparatively long follow-up period with a reasonable number of ESRD outcomes available for prospective analyses, and the use of different GRS methods. Limitations include its single-center design and the inability to generalize the findings to Southern Chinese Han and non-Chinese ancestry groups. The current GRS modeling was mainly based on genotyping data from a previous GWAS cohort; therefore, we cannot rule out bias of the GRS from over-fitted associations, and future evaluation of the GRS in more diverse cohorts and in prospective studies is required. We were unable to adjust for time-varying clinical factors in disease progression. The proportion of variation explained by the SNPs remained low, and the level of prediction for events was also relatively small. We lacked the power to demonstrate associations with the moderate genetic effects reported in a previous Southern Chinese Han GWAS.

In conclusion, we observed that GRSs comprising nine SNPs identified in a GWAS of IgAN were strongly associated with susceptibility to IgAN; HLA alleles contributed more than non-HLA alleles, and the uwGRS was simpler to calculate than the wGRS with similar predictive performance. The high-risk GRS9 group (uwGRS9 ≥ 16) had a high risk of ESRD during follow-up, suggesting a need for early and proactive intervention.

Methods 

Study population. The case-control cohort analyzed in this study was the same Chinese Han cohort included in the previous GWAS: 1,194 IgAN cases and 902 healthy controls recruited in the renal division of Peking University First Hospital. Quality control was performed as previously described. All cases had a biopsy diagnosis of IgAN, defined by typical light microscopy features and predominant IgA staining on kidney tissue immunofluorescence, in the absence of liver disease, vasculitis, Henoch-Schoenlein purpura, or other autoimmune diseases. This investigation was conducted according to the Declaration of Helsinki. All subjects provided informed consent to participate in genetic studies, and the ethics review committee of Peking University First Hospital approved the study protocol.

Baseline and follow-up clinical phenotypes. Detailed phenotypic data, including degree of renal dysfunction, hematuria and proteinuria at presentation, total serum IgA, and detailed biopsy findings (Haas staging), were collected at the time of renal biopsy at enrollment. Among the patients involved in the GWAS, 297 were followed for a mean of 5 years (range 1 to 15 years). Serum IgA and IgA1 were quantified by enzyme-linked immunosorbent assay. All patients received the same therapy regimen, including blood pressure control with a target of less than 130/80 mmHg, RAS inhibition, and steroids or other immunosuppressive agents for patients with persistent proteinuria. Blood pressure and proteinuria control were expressed as time-averaged mean arterial pressure and time-averaged proteinuria, respectively. The endpoint in this study was defined as a diagnosis of ESRD or death. ESRD was defined as eGFR < 15 ml/min/1.73 m² or the need for renal replacement therapy (hemodialysis, peritoneal dialysis or renal transplantation). eGFR was calculated using the Modification of Diet in Renal Disease (MDRD) formula.
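The text does not state which MDRD variant was used. The sketch below assumes the widely used 4-variable, IDMS-traceable equation (175 × Scr^-1.154 × age^-0.203, × 0.742 if female, × 1.212 if Black, with Scr in mg/dL) and applies the ESRD definition given above; the function names are hypothetical.

```python
def umol_to_mg_dl(scr_umol_l):
    # Serum creatinine conversion: 1 mg/dL = 88.4 umol/L.
    return scr_umol_l / 88.4


def mdrd_egfr(scr_mg_dl, age_years, female, black=False):
    """Assumed 4-variable IDMS-traceable MDRD eGFR in mL/min/1.73 m^2."""
    if scr_mg_dl <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    egfr = 175.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


def is_esrd(egfr, on_rrt=False):
    # ESRD as defined in the text: eGFR < 15 or renal replacement therapy.
    return on_rrt or egfr < 15.0
```

If the authors used a population-specific modification of MDRD, only the coefficients in `mdrd_egfr` would change; the ESRD rule stays the same.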

SNP selection. We first selected seven SNPs (Table 1), comprising five HLA SNPs and two non-HLA SNPs at five independent loci, which were independently associated with IgAN in the GWAS and were used in the GRS calculated in the previous report. Another large GWAS, conducted in a different Chinese Han population from Southern China, identified additional IgAN-associated non-HLA alleles, which were therefore also considered for the current study: rs2738048 (8p23), rs3803800 (17p13), rs4227 (17p13) and rs12537 (22q12). As the D' between rs3803800 and rs4227 was 0.92, and that between rs3803800 and rs4227 was 0.91, indicating high linkage disequilibrium and possibly non-independent genetic effects, a seven-SNP model at five independent loci and a nine-SNP model at seven independent loci were constructed. The nine-SNP model included the novel IgAN-associated variant rs2738048 and the missense variant rs3803800.
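The D' and r² values used above to judge SNP independence can be computed from a two-locus haplotype frequency and the two allele frequencies. The sketch assumes the haplotype frequency is already known (in practice it is usually estimated from unphased genotypes, for example by EM); the function name is hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2;
    p_a, p_b: frequencies of alleles A and B.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2
```

D' near 1 (as for the pairs above) means little historical recombination between the SNPs, while r² measures how well one SNP predicts the other; the two can differ when allele frequencies differ.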

Genetic risk score. Two GRSs were constructed on an a priori basis. The first, the unweighted GRS (uwGRS), was the simple count of the total number of risk alleles, without weighting by the effect of each SNP, because the currently available data may be insufficient to provide stable estimates of each small effect. The second, the weighted GRS (wGRS), used the allelic odds ratios (ORs) to account for the strength of the genetic association of each allele, because different IgAN alleles may have different odds ratios. The wGRS was the weighted sum of risk allele counts, where the weight for each SNP was the natural log of its OR. As different ORs may be observed for the same allele in different populations, we used the ORs observed in the current dataset. For comparison and cross-validation, we also directly calculated the standardized genetic risk score based on the seven SNPs associated with IgA nephropathy in the previous analysis of 10,755 individuals from 12 international case-control cohorts. Each allele was coded 0, 1, or 2 according to the number of copies of the target allele, as reported. Only individuals with 100% non-missing genotypes across all scored loci were analyzed; ultimately, 1190 cases and 899 controls were included in the current study.
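The weighted score described above can be sketched as follows. The orientation step is an assumption: when a reported OR is below 1 for the coded allele (as for rs2738048C, OR 0.81), the sketch flips to the other allele so that every weight ln(OR) is positive. The function names are hypothetical.

```python
import math


def orient_to_risk(dosage, odds_ratio):
    """Recode so the counted allele is the risk allele (OR > 1)."""
    if odds_ratio < 1:
        return 2 - dosage, 1 / odds_ratio
    return dosage, odds_ratio


def w_grs(dosages, odds_ratios):
    """Weighted GRS: sum over SNPs of ln(OR) x risk-allele count (0, 1 or 2).

    odds_ratios maps rs ID -> allelic OR of the counted (risk) allele.
    """
    missing = [snp for snp in odds_ratios if dosages.get(snp) is None]
    if missing:
        raise ValueError(f"missing genotypes: {missing}")  # complete cases only
    return sum(math.log(or_) * dosages[snp] for snp, or_ in odds_ratios.items())


def complete_cases(samples, snps):
    """Keep only samples genotyped at every scored SNP."""
    return [s for s in samples if all(s.get(snp) is not None for snp in snps)]
```

Setting every weight to 1 instead of ln(OR) turns `w_grs` into the unweighted count, which is why the two scores behave similarly when all ORs are close to 1.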

Statistical analysis. We used logistic regression to study the association of each allele 
with the risk of IgAN, according to an additive log-odds model. We calculated a GRS5 
that included five HLA alleles, a GRS4 that included four non-HLA alleles, a GRS7 
that included five HLA alleles and two non-HLA risk alleles, and a GRS9 that included 
five HLA alleles and four non-HLA risk alleles. 






The difference in the distribution of uwGRSs between IgAN cases and controls was tested using the chi-squared test. To explore the observed patterns in more detail, we also divided the subjects into quartiles based on the GRS distribution in controls and computed the proportions of cases and controls in each quartile. To assess whether risk differed significantly by quartile, we performed logistic regressions that modeled the risk of disease as a function of each GRS quartile compared with the reference quartile. Finally, we calculated the odds for the top group (group 4) compared with the bottom group (group 1) as the reference group.
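The chi-squared comparison of uwGRS distributions is a test on a 2 × k table of counts (cases and controls at each score value). A minimal sketch of the Pearson statistic and its degrees of freedom follows; the function name is hypothetical, and the p-value would come from the chi-squared distribution, for example `scipy.stats.chi2.sf(stat, df)`.

```python
def chi_squared_2xk(cases, controls):
    """Pearson chi-squared statistic and degrees of freedom for a 2 x k table.

    cases[i], controls[i]: number of subjects with the i-th score value.
    """
    if len(cases) != len(controls):
        raise ValueError("rows must have equal length")
    # Drop columns empty in both rows; they carry no information.
    cols = [(a, b) for a, b in zip(cases, controls) if a + b > 0]
    n_case = sum(a for a, _ in cols)
    n_ctrl = sum(b for _, b in cols)
    n = n_case + n_ctrl
    stat = 0.0
    for a, b in cols:
        col = a + b
        for obs, row_total in ((a, n_case), (b, n_ctrl)):
            expected = row_total * col / n
            stat += (obs - expected) ** 2 / expected
    return stat, len(cols) - 1
```

Score values at the extremes of the distribution are rare, so sparse columns may need pooling before the chi-squared approximation is trustworthy.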

Linear regression was applied for the correlation analysis of natural log-transformed serum IgA levels, natural log-transformed proteinuria and natural log-transformed eGFR. Binary logistic regression was used for the correlation analysis of a history of gross hematuria. Ordinal logistic regression was performed for the correlation analysis of clinical subtype, microscopic hematuria, CKD stage at the time of biopsy and Haas biopsy grade.

To set cut-off values between patients and controls, and within the cohort of IgAN patients grouped as progressive versus non-progressive cases, we used ROC curve analyses to find the best compromise between sensitivity and specificity; ROC curves were generated by plotting the sensitivity of the GRS against 1 - specificity, and the area under the curve (AUC) was calculated. As reported, the percentage of the total variance in the disease state explained by the risk score was estimated by Nagelkerke's pseudo R² from the logistic regression model, with the risk score as a quantitative predictor and disease state as the outcome. The C-statistic was estimated as the area under the ROC curve provided by the above logistic model. AUC statistics were compared using a non-parametric approach, as described previously. The Kaplan-Meier survival method and Cox proportional hazards models were used to generate estimates of the predicted risk of ESRD.
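The ROC steps above can be sketched in pure Python: the AUC as the Mann-Whitney probability that a case scores higher than a control (ties counted as one half), and the cut-off chosen by maximizing Youden's index (sensitivity + specificity - 1). Using Youden's index as the "best compromise" criterion is an assumption, as are the function names; scores at or above the cut-off are classed as high risk, matching uwGRS9 ≥ 16.

```python
def auc_mann_whitney(case_scores, control_scores):
    """AUC = P(case score > control score) + 0.5 * P(tie); O(n * m) pairs."""
    wins = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def youden_cutoff(case_scores, control_scores):
    """Return (cut, J, sensitivity, specificity) maximizing J; score >= cut is positive."""
    best = None
    for cut in sorted(set(case_scores) | set(control_scores)):
        sens = sum(s >= cut for s in case_scores) / len(case_scores)
        spec = sum(s < cut for s in control_scores) / len(control_scores)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cut, j, sens, spec)
    return best
```

For the progression analysis, "cases" would be IgAN patients who progressed and "controls" those who did not, rather than patients versus healthy subjects.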

Descriptive statistics included mean (SD) and median (range). These analyses were carried out with SPSS Statistics version 16.0.